Learning to Identify Arabic and German Dialects using Multiple Kernels

Authors

  • Radu Tudor Ionescu
  • Andrei M. Butnaru
Abstract

We present a machine learning approach for the Arabic Dialect Identification (ADI) and the German Dialect Identification (GDI) Closed Shared Tasks of the DSL 2017 Challenge. The proposed approach combines several kernels using multiple kernel learning. While most of our kernels are based on character p-grams (also known as n-grams) extracted from speech transcripts, we also use a kernel based on i-vectors, a low-dimensional representation of audio recordings, provided only for the Arabic data. In the learning stage, we independently employ Kernel Discriminant Analysis (KDA) and Kernel Ridge Regression (KRR). Our approach is shallow and simple, but the empirical results obtained in the shared tasks show that it performs very well. Indeed, we ranked first in the ADI Shared Task with a weighted F1 score of 76.32% (4.62% above second place) and fifth in the GDI Shared Task with a weighted F1 score of 63.67% (2.57% below first place).
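As a rough illustration of the pipeline the abstract outlines (character p-gram kernels, combined and then fed to Kernel Ridge Regression), here is a minimal sketch. The abstract does not specify the kernel functions, the combination weights, or the regularization, so everything concrete below is an assumption: a p-gram count inner-product (spectrum-style) kernel, an unweighted average over p = 2 and p = 3, one-vs-all KRR with `lam = 1e-3`, and made-up toy strings in place of real transcripts. The i-vector kernel and KDA are left out.

```python
from collections import Counter
import numpy as np

def pgram_counts(text, p):
    """Count the character p-grams of one transcript."""
    return Counter(text[i:i + p] for i in range(len(text) - p + 1))

def spectrum_kernel(docs_a, docs_b, p):
    """Inner products between p-gram count vectors (assumed kernel form)."""
    counts_a = [pgram_counts(d, p) for d in docs_a]
    counts_b = [pgram_counts(d, p) for d in docs_b]
    K = np.zeros((len(docs_a), len(docs_b)))
    for i, ca in enumerate(counts_a):
        for j, cb in enumerate(counts_b):
            K[i, j] = sum(v * cb[g] for g, v in ca.items() if g in cb)
    return K

def self_similarity(docs, p):
    """k(x, x) for every document, used to normalize the kernel."""
    return np.array([sum(v * v for v in pgram_counts(d, p).values())
                     for d in docs], dtype=float)

def combined_kernel(docs_a, docs_b, p_values=(2, 3)):
    """Unweighted average of normalized p-gram kernels (assumed combination)."""
    total = np.zeros((len(docs_a), len(docs_b)))
    for p in p_values:
        K = spectrum_kernel(docs_a, docs_b, p)
        norm = np.sqrt(np.outer(self_similarity(docs_a, p),
                                self_similarity(docs_b, p)))
        total += K / np.maximum(norm, 1e-12)  # guard against empty documents
    return total / len(p_values)

def krr_fit(K_train, labels, n_classes, lam=1e-3):
    """One-vs-all KRR in dual form: alpha = (K + lam * I)^-1 Y."""
    n = len(labels)
    Y = -np.ones((n, n_classes))
    Y[np.arange(n), labels] = 1.0
    return np.linalg.solve(K_train + lam * np.eye(n), Y)

def krr_predict(K_test, alpha):
    """Pick the class whose regression score is highest."""
    return np.argmax(K_test @ alpha, axis=1)

# Toy data (hypothetical strings, not real dialect transcripts).
train_docs = ["aaaa bbbb aaaa", "aabb aaba abaa",
              "xyzx yzxy zxyz", "zzyx xyzz yxzy"]
train_labels = np.array([0, 0, 1, 1])
test_docs = ["abab baab", "xyxy zzyz"]

K_train = combined_kernel(train_docs, train_docs)
K_test = combined_kernel(test_docs, train_docs)
alpha = krr_fit(K_train, train_labels, n_classes=2)
pred = krr_predict(K_test, alpha)
print(pred)
```

Summing (or averaging) kernel matrices is one simple way to combine kernels, because the sum of valid kernels is again a valid kernel. An additional kernel computed from other features, such as i-vectors, could be added to `total` in the same way.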

Similar resources

Arabic Dialect Identification Using iVectors and ASR Transcripts

This paper presents the systems submitted by the MAZA team to the Arabic Dialect Identification (ADI) shared task at the VarDial Evaluation Campaign 2017. The goal of the task is to evaluate computational models to identify the dialect of Arabic utterances using both audio and text transcriptions. The ADI shared task dataset included Modern Standard Arabic (MSA) and four Arabic dialects: Egypti...

Semi-Supervised Composite Kernel Learning Using Distance Metric Learning Techniques

The distance metric plays a key role in many machine learning and computer vision algorithms, so choosing an appropriate distance metric has a direct effect on the performance of such algorithms. Recently, distance metric learning using labeled data or other available supervisory information has become a very active research area in machine learning applications. Studies in this area have shown t...

Identifying dialects with textual and acoustic cues

We describe several systems for identifying short samples of Arabic or Swiss-German dialects, which were prepared for the shared task of the 2017 DSL Workshop (Zampieri et al., 2017). The Arabic data comprises both text and acoustic files, and our best run combined both. The Swiss-German data is text-only. Coincidentally, our best runs achieved an accuracy of nearly 63% on both the Swiss-German and A...

Neural Network-Based Learning Kernel for Automatic Segmentation of Multiple Sclerosis Lesions on Magnetic Resonance Images

Background: Multiple Sclerosis (MS) is a degenerative disease of the central nervous system. MS patients have dead tissue in their brains, called MS lesions. MRI is an imaging technique sensitive to soft tissues such as the brain, and it shows MS lesions as hyper-intense or hypo-intense signals. Since manual segmentation of these lesions is a laborious and time-consuming task, automatic segmentation ...

Investigating the challenges of teaching and learning Arabic in the high schools of Zabol County

Purpose: This paper aims to evaluate the teaching and learning processes of the Arabic course in the high schools of Zabol County. Methodology: A descriptive-correlational research method was applied, and the statistical population comprised two groups: students and teachers of the Arabic course. The study had a practical aim and relied on the general hypothesis stating that the Arabic ...

Publication date: 2017